  • How to determine which variables were dropped

    I have used this command to drop variables that had more than 50% of their data missing.

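    Code:
    glo p 0.5

    foreach var of varlist b-arp * {
        // count the observations in which this variable is missing
        count if missing(`var')
        // drop the variable if the missing fraction reaches the threshold
        if (r(N)/_N) >= $p drop `var'
    }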

    But it didn't give me the names of the variables that have been dropped.

  • #2
    Noor:
    Your code is not expected to do that.
    Just save a copy of your original dataset, use a different copy to run your code and then use the -cf- command to compare the two.
    Kind regards,
    Carlo
    (Stata 19.0)
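
The comparison described above might look like the following minimal sketch. It is an illustration only: auto.dta stands in for the real data, a plain -drop- stands in for the dropping loop, and the tempfile names `original' and `reduced' are hypothetical. The original is loaded as the master dataset so that -cf- can flag variables that no longer exist in the reduced copy. -cf- exits with an error code when it finds differences, hence the -capture noisily-.

```stata
* Sketch only: auto.dta and the tempfile names are stand-ins (assumptions).
sysuse auto, clear
replace rep78 = . in 1/50              // make rep78 mostly missing (74 obs)

tempfile original reduced
save `original'                        // copy of the data before dropping

drop rep78                             // stand-in for the dropping loop
save `reduced'                         // copy of the data after dropping

* Load the original as master and compare it with the reduced copy.
* Variables absent from the using file are flagged in the output.
use `original', clear
capture noisily cf _all using `reduced'
```

Note that -cf- requires both datasets to have the same number of observations, which holds here because only variables are dropped.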

    • #3
      Thanks, Carlo, for your response.

      Did you mean that this command is used to compare two datasets? I am not familiar with it.

      And will it help me determine which variables were dropped when I have 900 of them?

      • #4
        Noor:
        1) see -help cf- and the related entry in the Stata .pdf manual;
        2) it's up to you, given your research goals, to decide whether that information is helpful or not.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thanks a lot. It was really helpful.

          • #6
            Carlo has already given helpful advice, based on a post-hoc comparison of two datasets. Here is a different approach: first determine whether the missing fraction is at or above a threshold and, if so, report the name of the variable before dropping it. You can easily modify this to report the percentage as well.

            Code:
            local p = 0.5
            
            foreach var of varlist b-arp * {
              summ `var', meanonly
              local pct_missing = (1 - r(N)/_N)    // compute fraction of missing data, and store it in a local macro
              if `pct_missing' >= `p'  {
                display "Dropping: `var'"  // print the variable name, then drop it
                drop `var'
              }
            }
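
One possible version of that modification is sketched below. It prints the percentage and also collects the dropped names in a local macro, so they can be listed in one place at the end. It is a sketch under assumptions: auto.dta and the `_all` varlist stand in for the real data and variable range, and the macro names `frac_missing` and `dropped` are made up for illustration. It uses -count if missing()- rather than -summarize-, because -summarize- returns r(N) = 0 for string variables, which would make every string variable look entirely missing.

```stata
* Sketch only: auto.dta and the variable range are stand-ins (assumptions).
sysuse auto, clear
replace rep78 = . in 1/50                  // make rep78 mostly missing (74 obs)

local p = 0.5
* start with an empty list of dropped variable names
local dropped

foreach var of varlist _all {
    * missing() handles both numeric and string variables
    quietly count if missing(`var')
    local frac_missing = r(N)/_N
    if `frac_missing' >= `p' {
        display "Dropping: `var' (" %5.1f 100*`frac_missing' "% missing)"
        local dropped `dropped' `var'      // remember the name
        drop `var'
    }
}

display "Dropped variables: `dropped'"
```

With 900 variables, the final list in `dropped` may be more convenient than scrolling through the per-variable messages.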

            • #7
              Thank you so much for your help.
